SUMMARY


We performed a series of experiments examining the effect of contrast on the perception of moving 
plaids to test the hypothesis that the human visual system determines the direction of a moving plaid in a 
two-staged process: decomposition into component motion followed by application of the intersection of 
constraints rule. Although there is recent evidence that the first tenet of the hypothesis is correct, i.e., 
that plaid motion is initially decomposed into the motion of the individual grating components, the 
nature of the second-stage combination rule has not yet been established. We found that when the
gratings within the plaid are of different contrast the perceived direction is not predicted by the
intersection-of-constraints rule. There is a strong (up to 20°) bias in the direction of the higher-contrast grating. A
revised model, which incorporates a contrast-dependent weighting of perceived grating speed as has 
been observed for one-dimensional patterns, can quantitatively predict most of our results. We then 
discuss our results in the context of various models of human visual motion processing and of
physiological responses of neurons in the primate visual system.


INTRODUCTION 


Deducing the direction of motion of a pattern as a whole from the motion of oriented components 
within that pattern is a challenge for all models of human visual motion processing. Adelson and 
Movshon (1982) studied the problem using moving plaids, the sum of two drifting gratings of different 
orientations. They proposed that the human visual system determines the direction of a moving pattern 
using a two-step procedure: first, the velocities of oriented components within the pattern are estimated, 
then at a later stage they are recombined to calculate the motion of the pattern as a whole. 

Their hypothesis was formulated to explain the psychophysical finding that, in order for the two 
components to cohere (to move together as a plaid), the gratings must be similar in spatial frequency. 
They concluded from this finding that the human visual system analyzes plaid motion by first
decomposing it into the motion of the grating components (fig. 1(a)). They suggested that this decomposition is
the natural consequence of having orientation and spatial-frequency tuned sensors at the front end of the 
system (for a review, see DeValois and DeValois, 1980). They also proposed that, at the second stage 
(fig. 1(b)), the component velocities are recombined using the intersection of perpendicular constraints 
(Fennema and Thompson, 1979) to yield a measure of the motion of the plaid as a whole. 

When plaid motion is plotted in velocity space (fig. 1(b)), the motion of each grating component 
within the plaid is ambiguous: consistent with a family of velocities lying along a constraint line (thick 
lines). Adelson and Movshon proposed that plaid velocity is recovered as the unique vector defined by 
the intersection of both constraint lines. The plaid-velocity vector is thus consistent with the rigid motion 
of both of the individual gratings and is therefore a measure of the motion of the coherent plaid. The lack 
of coherence for gratings of widely differing spatial frequencies was explained by assuming that the 
second stage only combines information from sensors with similar spatial-frequency tuning. 
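
The intersection-of-constraints construction described above amounts to solving two linear equations: the pattern velocity must have the measured normal speed along each grating's normal vector. The following is a minimal sketch under stated assumptions. The function name `ioc_velocity`, the convention that normal directions are measured in degrees from the vertical axis (positive to the right), and the example speeds are illustrative and are not taken from the original study.

```python
import math

def ioc_velocity(normal1_deg, speed1, normal2_deg, speed2):
    """Solve v . n1 = speed1 and v . n2 = speed2 for the pattern velocity v.

    Each angle gives a grating's normal direction in degrees from the
    vertical axis (positive to the right); this convention is an assumption.
    Returns (vx, vy) in the same units as the input speeds.
    """
    a1, a2 = math.radians(normal1_deg), math.radians(normal2_deg)
    n1 = (math.sin(a1), math.cos(a1))
    n2 = (math.sin(a2), math.cos(a2))
    # Determinant of the 2x2 system; zero means the constraint lines are parallel.
    det = n1[0] * n2[1] - n1[1] * n2[0]
    if abs(det) < 1e-12:
        raise ValueError("parallel gratings: constraint lines do not intersect")
    # Cramer's rule for the intersection of the two constraint lines.
    vx = (speed1 * n2[1] - speed2 * n1[1]) / det
    vy = (n1[0] * speed2 - n2[0] * speed1) / det
    return vx, vy

# Two gratings with normals at +60 and -60 deg, each drifting at 1 deg/sec.
vx, vy = ioc_velocity(60.0, 1.0, -60.0, 1.0)
```

With equal component speeds the horizontal terms cancel, so the recovered velocity points straight up with a speed twice that of either grating.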

They found support for their hypothesis in the discovery of two types of motion-sensitive neurons in
the monkey visual cortex: one sensitive to component motion and one, at a higher level within the
cortex, sensitive to the motion of the plaid as a whole (Movshon et al., 1986). Furthermore, a recent
study has found that speed discrimination for moving-plaid stimuli is consistent with the two-staged
approach that Adelson and Movshon proposed (Welch, 1989). However, the second-stage recombination
rule proposed by Adelson and Movshon has recently been challenged (Ferrera and Wilson, 1988, 1989).

Figure 1.- The Adelson-Movshon model. (a) Basic two-stage framework where plaid motion is
decomposed into the motion of the grating components, then reconstructed at a second stage. (b) The
intersection of constraints rule is demonstrated by showing the motion of a plaid in velocity space. The
direction of plaid motion (α), a function of the speed of the grating components (V1 and V2) and of
the plaid angle (θ), is given by equation (4).

In this study, we extend the investigation of how the brain determines the direction of motion. In 
particular, we examine the effect of contrast on the perceived direction of a moving plaid. It has been 
shown that the perceived speed of a single grating is a function of contrast (Thompson, 1982). At
temporal frequencies below 8 Hz, a low-contrast grating appears to move more slowly than a standard
higher-contrast grating moving at the same physical speed. If this contrast-dependent distortion in the perceived
speed of the components is passed on to the second stage of the model in figure 1, then a significant 
contrast-dependent distortion in the perceived direction of motion of the plaid as a whole should result. 
We confirmed this prediction: making the contrast of the individual components within the plaid 
unequal results in errors in judgment of direction. The perceived direction of motion can differ by 
up to 20° from that predicted by the model in figure 1. 

We therefore propose a revised model that incorporates Thompson’s finding as a contrast-dependent 
distortion of component speed. Because the proposed contrast dependence in the revised model is a 
function of the component contrast in threshold units, we first measured the detection threshold of 
moving gratings in the presence of a moving grating mask of different orientation, with the geometric 
arrangement the same as that of the plaid stimuli. Simulations of the revised model show that, in most 
cases, if the contrast-distorted estimate of grating speed, rather than the true grating speed, is the input to 
the second stage of processing, then the observed errors in perceived direction can be quantitatively 
predicted. Preliminary results have been presented elsewhere (Stone, Mulligan, and Watson, 1988a,b). 

This research was partially supported by a National Research Council associateship to Leland Stone. 
The authors thank John Perrone, J. Mattei Valeton, and Al Ahumada for their comments on an earlier 
draft; Bill Paulsen for his help with figures 1 and 9; and John Maunsell for graciously providing
electrophysiological data.


GENERAL METHODS 


The stimulus used in this study was a vignetted plaid, the sum of two sinusoidal gratings of different 
orientations viewed through a two-dimensional (2-D) Gaussian window. Figure 2 shows an example of 
such a stimulus. We generated moving plaids on a Mitsubishi 19-in. high-resolution monochrome 
monitor (model M-6950) using an Adage RDS 3000 image display system. The luminance output of 
the monitor was calibrated and corrected for its gamma nonlinearity using a lookup table procedure 
described elsewhere (Watson et al., 1986). A detailed account of the animation procedure that was used 
to generate moving plaids can be found in Mulligan and Stone (1989). 

Figure 2.- The standard plaid stimulus. The two gratings were 1.5 c/d, oriented 60° symmetrically from
vertical, and viewed through a Gaussian window. The halftoning for this figure was not that used for
the actual stimulus. For a detailed discussion of the halftoning used to display our stimuli see
Mulligan and Stone (1989).

Briefly, the plaid stimulus was a 512 pixel by 512 pixel 8-bit/pixel image created using both locally
developed programs and the HIPS image-processing software package (Landy, Cohen, and Sperling,
1984). First, four 2-D sinusoidal gratings were generated (sine- and cosine-phase components of gratings
with two different orientations symmetric with respect to the vertical axis). These four images were then
multiplied by a 2-D Gaussian (x and y standard deviations of 90.5 pixels). This procedure eliminated
the sharp edges at the boundaries of the stimulus. The images were then halftoned using a modified
error-diffusion method (Floyd and Steinberg, 1975; Mulligan, 1986). The resulting four bit-mapped
images were then loaded into the four lower-order bit-planes. A 3 pixel by 3 pixel white fixation cross
was drawn into a fifth bit-plane in the center of the image. The remaining three bit-planes were blank.
The image could be loaded into the framebuffer within a few seconds. Then, by varying the color lookup
table on a frame-by-frame basis (at 60 Hz), we modulated the contrast of the sine- and cosine-phase
components of each grating in temporal quadrature so that they appeared as a single drifting grating. In
this way, we had complete control over the speed and contrast of both gratings within the plaid without
having to load new images into the framebuffer. Furthermore, the initial spatial phases of the gratings
within the plaid were randomized so that position cues could not be used to assess motion.
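
The temporal-quadrature modulation described above rests on the identity cos(kx)cos(ωt) + sin(kx)sin(ωt) = cos(kx - ωt): two stationary gratings in spatial quadrature, whose contrasts are modulated in temporal quadrature, sum to one drifting grating. The sketch below checks this numerically for a single spatial dimension. The function names and the one-dimensional simplification are assumptions made for illustration; the sketch does not reproduce the lookup-table animation actually used.

```python
import math

def quadrature_grating(x_deg, t_sec, contrast, sf_cpd, tf_hz):
    """Sum of stationary cosine- and sine-phase gratings whose contrasts
    are modulated in temporal quadrature (hypothetical 1-D illustration)."""
    phase_x = 2 * math.pi * sf_cpd * x_deg
    phase_t = 2 * math.pi * tf_hz * t_sec
    cos_weight = contrast * math.cos(phase_t)  # contrast of cosine-phase image
    sin_weight = contrast * math.sin(phase_t)  # contrast of sine-phase image
    return cos_weight * math.cos(phase_x) + sin_weight * math.sin(phase_x)

def drifting_grating(x_deg, t_sec, contrast, sf_cpd, tf_hz):
    """A single grating drifting at tf_hz / sf_cpd degrees per second."""
    return contrast * math.cos(2 * math.pi * (sf_cpd * x_deg - tf_hz * t_sec))

# Compare the two descriptions at an arbitrary position and time.
a = quadrature_grating(0.37, 0.12, 0.10, 1.5, 1.5)
b = drifting_grating(0.37, 0.12, 0.10, 1.5, 1.5)
```

Because only the two weights change from frame to frame, speed and contrast can be altered without regenerating the stored images, which is the property the text relies on.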

There were small, but measurable, departures from linearity of spatial summation in our display
monitor, which conflict with one of the basic assumptions underlying halftoning techniques. However, 
using a technique described elsewhere (Mulligan and Stone, 1989), for a stimulus of 40% total contrast, 
we estimated the contrast of the largest artifact to be less than 0.2% and, in particular, those artifacts 
harmonically related to the stimulus were even smaller. 

The standard plaid stimulus consisted of two 1.5 cycle/degree (c/d) gratings whose normal vectors 
were oriented symmetrically ±60° from the vertical axis (fig. 2). We defined the plaid angle as the angle
between the normal vector defining each grating and the bisecting axis, or half the angle between the
normal vectors (θ in fig. 1(b)). It was therefore 60° for the standard plaid. This arbitrary definition was
chosen because it simplifies the equations presented below. The grating contrasts were 10% each. For a 
pair of sinusoidal gratings, the total contrast is simply the sum of the grating contrasts or 20%. The 
speed of the coherent plaid was held constant at 2°/sec. In some experiments, the spatial frequency was 
0.75 or 3.0 c/d, the plaid angle was 45° or 30°, the total contrast was 5, 10, or 40%, and the plaid speed 
was increased to 6.0°/sec. 
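
For a plaid moving along its bisecting axis, the intersection-of-constraints geometry implies that each grating drifts at the plaid speed multiplied by the cosine of the plaid angle. The sketch below works through that arithmetic for the standard stimulus. The helper names are hypothetical, and the cosine relation is a consequence of the geometry rather than a formula quoted from the text.

```python
import math

def component_speed(plaid_speed_deg_s, plaid_angle_deg):
    """Normal speed of each grating for a plaid moving along the bisecting
    axis (assumes symmetric gratings; derived from the IOC geometry)."""
    return plaid_speed_deg_s * math.cos(math.radians(plaid_angle_deg))

def temporal_frequency_hz(sf_cpd, grating_speed_deg_s):
    """Temporal frequency is spatial frequency times grating speed."""
    return sf_cpd * grating_speed_deg_s

# Standard plaid: 2 deg/sec, plaid angle 60 deg, 1.5 c/d gratings.
v_std = component_speed(2.0, 60.0)
tf_std = temporal_frequency_hz(1.5, v_std)
# Faster plaid: 6 deg/sec with the same gratings.
tf_fast = temporal_frequency_hz(1.5, component_speed(6.0, 60.0))
```

Under these assumptions the standard plaid has gratings moving at about 1°/sec (1.5 Hz), and the 6°/sec plaid corresponds to about 4.5 Hz.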

Subjects viewed the screen binocularly through natural pupils from a distance of 273 cm. This 
distance made the image subtend 5.4° by 5.4° (20 pixels/cm) and made the high-frequency halftoning 
noise invisible except at the highest contrast level (40% total contrast). The mean luminance of the 
image was 100 cd/m². The stimulus presentation lasted 300 msec. The contrast rose with a Gaussian
time course (standard deviation of 0.71 frames), reached full contrast after 50 msec (3 frames), stayed at 
full contrast for 200 msec, then fell with the same Gaussian time course during the final 50 msec. We 
used four male observers (three of whom were unaware of the purpose of the experiment) between 16 
and 30 years old. 


Experiment 1: Detection Threshold for Moving Plaid Components 

Before performing the main experiment of this study, we measured the detection threshold of our 
subjects for each of the components within the plaid. This was to ensure that both gratings within the 
plaid were above threshold in experiment 2 and to convert absolute contrast into threshold units which 
were needed for the simulations. 

Methods- We determined the threshold for detecting the presence of a moving sinusoidal grating 
(signal) in the presence of a second moving grating (mask) of higher contrast and different orientation 
using an unconventional procedure: we held total contrast (mask plus signal) constant (at 5, 10, 20, or 
40%) to allow easy comparison with the data from experiment 2. The signal and mask were both 1.5 c/d 
sinusoidal gratings oriented either +60° or -60° from the vertical axis and moving at 1°/sec. The choice
of which of the two gratings was signal and which was mask was made randomly before each trial.
Threshold was determined using a two-interval, forced-choice protocol. The signal contrast level was
chosen from a finite set which varied from 2.5% to 0.025% in steps of one-fifteenth of a log unit. A trial
consisted of two stimulus intervals (300 msec each, separated by a 500-msec blank interval) presented in
random order: one in which both signal and mask were present at a fixed total contrast and another in 
which only the mask was present. Thus, although the mask varied from trial to trial, it was identical in 
both intervals within a single trial. The signal contrast on a given trial was determined by one of two 
independent, randomly interleaved staircases. Within each staircase, contrast was reduced after two
correct responses and increased after a single incorrect response. 
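
The staircase rule above (down after two correct responses, up after one error) can be sketched as a small state update over the discrete set of signal contrasts. This is an illustrative sketch only: the function names, the clamping at the ends of the set, and the resetting of the correct-response count after every contrast change are assumptions not stated in the text.

```python
def contrast_levels():
    """Signal contrasts in percent, from 2.5% down to 0.025% in steps of
    one-fifteenth of a log unit (31 levels spanning two log units)."""
    return [2.5 * 10 ** (-i / 15) for i in range(31)]

def staircase_step(index, run, correct, n_levels=31):
    """Return (new_index, new_run) after one trial.

    index: position in contrast_levels(); larger index means lower contrast.
    run:   number of consecutive correct responses at the current level.
    """
    if correct:
        run += 1
        if run == 2:
            # Two correct in a row: reduce contrast by one step.
            return min(index + 1, n_levels - 1), 0
        return index, run
    # A single error: increase contrast by one step.
    return max(index - 1, 0), 0

levels = contrast_levels()
state = staircase_step(5, 0, True)        # first correct response: no change
state = staircase_step(*state, True)      # second correct response: step down
```

Two such staircases, each keeping its own `(index, run)` state, could be interleaved by choosing one at random before each trial.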

Subjects were instructed to watch the screen and to fixate the small cross which appeared for 
500 msec immediately before the onset of each stimulus, and was extinguished while the moving stimulus
was displayed. They were then asked to indicate whether the signal was in the first or second interval.
The resulting proportion correct (P) versus signal contrast (x) was fit with the best-fitting Weibull
function (Watson, 1979; Weibull, 1951) 

$$ P = \min\left(0.99,\; 1 - 0.5\, e^{-(x/T)^{3.5}}\right) \qquad (1) $$


where T is detection threshold. Thus, threshold was defined as the signal contrast at which the observer 
distinguished correctly 82% of the time between a weak “signal” grating moving within a moving plaid 
(i.e., in the presence of a “mask” grating) and the moving mask grating alone. 
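
The 82% figure follows from the Weibull form of equation (1), assuming a slope exponent of 3.5 and a ceiling of 0.99: at x = T the exponential term equals 1/e, so P = 1 - 0.5/e. The short sketch below evaluates this; the function name and default arguments are illustrative.

```python
import math

def weibull_p_correct(x, threshold, slope=3.5, ceiling=0.99):
    """Proportion correct in a two-interval forced-choice task, assuming the
    Weibull form P = min(ceiling, 1 - 0.5 * exp(-(x / T) ** slope))."""
    return min(ceiling, 1.0 - 0.5 * math.exp(-((x / threshold) ** slope)))

p_at_threshold = weibull_p_correct(1.0, 1.0)     # about 0.816
p_at_zero = weibull_p_correct(0.0, 1.0)          # chance performance, 0.5
p_well_above = weibull_p_correct(10.0, 1.0)      # limited by the ceiling
```

The function runs from chance (0.5) at zero signal contrast up to the 0.99 ceiling, passing through roughly 0.82 at threshold.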

Results- Figure 3 plots log10 threshold contrast (T) as a function of log10 mask contrast (M). The
data for all four subjects have a flat region below some critical mask contrast followed by a linear rising
slope. This result is similar to the detection threshold results of Legge and Foley (1980) for pairs of
stationary gratings of different spatial frequency. To quantify the results, we did a simple piecewise linear
fit. For the flat portion, we made the assumption that the mean threshold at 5% total contrast (leftmost
data points in fig. 3) was the mean unmasked threshold (c). To measure the linear rising phase, we fit
the three clusters of points generated for total contrasts of 10% and higher using linear regression to
determine the slope (a) and intercept (b). Although it is arbitrary to include the points generated at 10%
total contrast, any resulting error is probably small. The mean curve, the solid line in figure 3, is given
by the following equation:

$$ T = \max\left(10^{\,a \log_{10} M - b},\; c\right) \qquad (2) $$

with a = 0.548, b = 1.526, and c = 0.0078. Equation (2) is merely a power law with an exponent of
0.548 for mask contrasts above 8.6%. The exponent found here is similar to that for stationary grating
masks of different spatial frequency (range: 0.50 to 0.79 in Legge and Foley, 1980) and of different
orientation (range: 0.40 to 0.72 in Phillips and Wilson, 1984). (We will use eq. (2) to estimate threshold for
the simulations in fig. 11.)

Figure 3.- Detection threshold versus mask contrast. This log-log plot contains the thresholds of all four
subjects at four different total contrasts (5, 10, 20, and 40%). The solid line is the fitted curve used
for the simulations and is given by equation (2). Horizontal axis: log10 mask contrast.
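
Equation (2) can be evaluated directly. The sketch below assumes the form T = max(10^(a·log10 M - b), c) with contrasts expressed as fractions rather than percentages; that reading is an assumption, chosen because it places the crossover between the flat and rising portions near the mask contrast quoted in the text. The function names are illustrative.

```python
import math

def masked_threshold(mask_contrast, a=0.548, b=1.526, c=0.0078):
    """Detection threshold (as a contrast fraction) for a signal grating in
    the presence of a mask grating, assuming
    T = max(10 ** (a * log10(M) - b), c)."""
    return max(10 ** (a * math.log10(mask_contrast) - b), c)

def critical_mask_contrast(a=0.548, b=1.526, c=0.0078):
    """Mask contrast at which the power-law branch equals the unmasked
    threshold c, i.e. where the flat region ends."""
    return 10 ** ((math.log10(c) + b) / a)

t_weak_mask = masked_threshold(0.05)    # below the crossover: returns c
t_strong_mask = masked_threshold(0.20)  # on the power-law branch
m_crit = critical_mask_contrast()       # roughly 0.087
```

With the fitted parameters the crossover falls at a mask contrast of roughly 0.087, in line with the value of about 8.6% given above.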

For two subjects, we measured the effect of temporal and spatial frequency on detection threshold 
(fig. 4). Temporal frequency (speed) had little effect on threshold (fig. 4(a)). However, threshold was 
very sensitive to changes in spatial frequency (fig. 4(b)): it increased at lower spatial frequency 
(0.75 c/d) and decreased at higher spatial frequency (3.0 c/d). This is consistent with previous studies of 
human spatio-temporal contrast sensitivity (Robson, 1966; Koenderink and van Doorn, 1979). (The
thresholds for these two observers at these temporal and spatial frequencies, given by eq. (2) with a, b,
and c as shown in table 1, were used for the simulations in figs. 12 and 13.) 

Experiment 2: Effect of Contrast on the Perceived Direction of Plaid Motion 

These experiments were conducted to measure systematically the effect of the relative contrast of the 
two gratings within the moving plaid on the perceived direction of motion. They were designed to test 
the model shown in figure 1 which predicts that changes in contrast will have no systematic effect on the 
perceived direction of plaid motion. 

Methods- Subjects were presented with a single stimulus interval and were asked whether the 
stimulus moved to the left or right of subjective vertical. The true direction of plaid motion was varied 
by making the appropriate change in the ratio of the speeds of the two gratings (speed ratio) while the 
speed of the coherent plaid was held constant at 2°/sec. The direction was changed within two
interleaved up-down staircases to determine the direction for which subjects chose left or right with equal
probability. We call this point perceived vertical and, for simplicity, we express it in degrees with 
respect to true vertical. 

While total contrast was held constant at 5, 10, 20, or 40%, the ratio of the contrasts of the two
gratings (contrast ratio) was varied in steps of √2. For example, the possible contrast pairs with 40% total
contrast are: 20%, 20%; 23.4%, 16.6%; 26.7%, 13.3%; 29.6%, 10.4%, etc., and the symmetric
counterparts. The series of stimuli included all possible contrast ratios which were powers of √2 and for
which both gratings were above detection threshold. Because these series contained so many conditions,
they were split into two interleaved subseries: one with contrast ratios which were even powers of √2
and another with odd powers of √2. The two subseries were presented in separate sessions.
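
The contrast pairs listed above follow from two conditions: the two contrasts sum to the total contrast, and their ratio is a power of √2. The following sketch reproduces the quoted pairs for 40% total contrast; the function name is hypothetical.

```python
import math

def contrast_pair(total_contrast, power_of_sqrt2):
    """Return (higher, lower) grating contrasts whose sum is total_contrast
    and whose ratio is sqrt(2) ** power_of_sqrt2."""
    ratio = math.sqrt(2) ** power_of_sqrt2
    high = total_contrast * ratio / (1 + ratio)
    return high, total_contrast - high

# Contrast pairs (in percent) for 40% total contrast, ratios sqrt(2)**0..3.
pairs = [tuple(round(c, 1) for c in contrast_pair(40.0, n)) for n in range(4)]
```

Even values of `power_of_sqrt2` give the integer ratios (1, 2, 4, 8) and odd values give the intermediate ones, matching the split into two subseries.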

For example, if the contrast of one of the gratings is so low that its perceived speed is half that of its
true speed and if perceived rather than true speed feeds into the second stage of the model in figure 1,
then the intersection of constraints rule will yield a severe directional bias toward the direction
of motion of the grating of higher contrast (fig. 5(a)). To quantify this bias, the true direction of plaid
motion is altered by varying the speed ratio of the components. When the speed ratio reaches 2 to 1, the
plaid will appear to move straight up (fig. 5(b)). Perceived vertical measured this way (bias in fig. 5(b))
is equal and opposite to the bias seen when the plaid is actually moving straight up (bias in fig. 5(a))
assuming the contrast-dependent distortion is a multiplicative speed distortion, which is independent of
temporal frequency.

Figure 4.- Detection threshold versus mask contrast. These plots contain the thresholds of two subjects
at four different total contrasts at two different temporal frequencies (a) and three different spatial
frequencies (b). The solid curves are given by equation (2) using the parameters shown in table 1.


TABLE 1.- THRESHOLD PARAMETERS DEPEND ON TEMPORAL
AND SPATIAL FREQUENCY

Spatial       Temporal      a         b              c
frequency,    frequency,    slope     intercept      unmasked
c/d           Hz                                     threshold

0.75          1.5           0.536     [illegible]    [illegible]
1.5           1.5           0.554     [illegible]    [illegible]
3.0           1.5           0.448     [illegible]    [illegible]
1.5           4.5           0.553     1.497          0.0078


Figure 5.- Measuring the directional bias of a plaid moving straight up but composed of unequal
contrast gratings. (a) If Thompson (1982) is correct, the lower-contrast grating (grey) will appear to
move more slowly and the intersection-of-constraints rule applied to the perceived grating speeds
will predict a bias toward the direction of motion of the higher-contrast grating (black). (b) If the
speed ratio is changed until the plaid is perceived to move straight up, then the true direction of plaid
motion will have an equal and opposite bias to that in (a).

Our staircase method yielded typical psychometric curves (fig. 6). We fit the data for each condition 
with a cumulative Gaussian using a weighted least-squares procedure (Mulligan and MacLeod, 1988) 
based on probit analysis (Finney, 1971). The standard deviation of the best-fitting cumulative Gaussian 
was defined as the precision in the observer’s direction judgments. The location of the inflection point 
represents a bias which we refer to as the perceived vertical (the direction of motion that is perceived as 
pure vertical). 

Figure 6.- Raw psychometric curves for plaid direction discrimination. This figure plots the percentage
of stimulus presentations that were perceived as leftward versus the true direction of motion of the
stimulus, for a single subject, for three runs at the three different contrast ratios indicated above the
curves. Positive angles indicate leftward motion.


Results- For the standard plaid with a contrast ratio of 1, table 2 shows the perceived vertical and
precision of each of the four subjects (mean and standard deviation over three runs). Observers were
able to determine the direction of plaid motion to within about ±4°. Although there is some
idiosyncratic variability across subjects, the mean perceived vertical is close to zero (0.1°), which
indicates that there was little or no systematic bias.


TABLE 2.- PRECISION FOR STANDARD PLAID
WITH CONTRAST RATIO OF 1

Subject     Perceived vertical,     Precision,
            deg                     deg

L.S.        -0.7 ±1.3               4.0 ±1.0
L.L.         0.5 ±0.4               3.4 ±0.4
E.P.        -3.4 ±3.7               5.5 ±1.6
C.F.         4.0 ±2.6               2.7 ±0.7
Mean         0.1                    4.0


Figure 6 shows raw data for naive observer C.F. The psychometric curves shift along the x-axis for 
different contrast ratios: perceived vertical goes from 14.9° rightward for a contrast ratio of 0.125, to 
1.1° rightward at equal contrast, and finally to 10.3° leftward for a contrast ratio of 8. However, the 
precision for the three conditions remained nearly unchanged at 3.4°, 3.4°, and 2.5°, respectively. This 
illustrates, at the raw-data level, that there are systematic changes in perceived vertical that occur as a 
function of contrast ratio and which cannot be explained by changes in precision. 

A complete analysis of the performance of all four subjects shows that varying the contrast ratio
away from 1 produced large distortions in the perceived direction of motion. Figure 7 plots perceived
vertical in degrees away from true vertical as a function of log2 contrast ratio at total contrasts of 5, 10,
20, and 40% for all four subjects. (Typical standard deviations are plotted for 40% contrast.) When the
gratings were of unequal contrast, the perceived direction of motion was shifted toward that of the
higher-contrast grating. The effect increased systematically with increased contrast ratio, although it
varied for different total contrasts. All four subjects showed the same qualitative behavior.





Figure 7.- Perceived vertical versus contrast ratio. (a)-(d) This figure plots perceived vertical for all four
subjects at four different total contrasts (5, 10, 20, and 40%). Error bars indicate standard deviations.
Positive values indicate biases to the left of actual vertical.

The precision of the direction judgments was insensitive to changes in contrast except at extreme
contrast ratios. Figure 8 plots precision as a function of log2 contrast ratio at four different total
contrasts for all four subjects. Although subjects varied in their overall sensitivity to the direction of motion,
there were no systematic effects of contrast ratio on precision.

Figure 8.- Precision versus contrast ratio. (a)-(d) This figure plots precision for all four subjects at four
different total contrasts (5, 10, 20, and 40%). Note that the scale is greatly amplified as compared to
figure 7.


A Revised Model 

The Adelson-Movshon model, as shown in figure 1, fails to explain the results of experiment 2, 
because it tacitly implies that the speeds of the components are accurately determined regardless of 
contrast. In this section, we amend the model to incorporate the finding of Thompson (1982) that the 
perceived speed of moving gratings is a function of contrast. The revised model is then tested with a 
variety of moving plaid stimuli. 

Theory- Figure 9 shows the revised model. The modification is that the second stage is passed a
contrast-distorted version of grating speed (for convenience, hereafter called perceived speed) rather
than actual grating speed. We construct perceived speed by multiplying actual speed by f, the
contrast-dependent weighting function. For each grating, f is a function of the grating’s contrast in threshold
units (C_T), determined by dividing absolute contrast by threshold calculated using equation (2) with the
other grating acting as the mask. Figure 10 shows the contrast-dependent weighting function that we
used for the simulations that follow. We chose a Weibull function (Weibull, 1951) which goes to zero at
threshold and which rises rapidly to just about 1 for contrasts exceeding 10 threshold units. The explicit
formula was:

$$ f = \begin{cases} 1 - e^{-\left(\frac{C_T - 1}{k_1}\right)^{k_2}} & \text{for } C_T \ge 1 \\ 0 & \text{for } C_T < 1 \end{cases} \qquad (3) $$

with k1 = 1.99 and k2 = 0.76 (by a least-squares fit of the data in fig. 11).

Figure 9.- A revised model. A contrast-dependent nonlinearity is added to each channel in the
Adelson-Movshon model. The nonlinearity is a function of the contrast in threshold units of the input grating
for the particular channel. Because threshold will be altered in the presence of the other grating
(masking), the nonlinearity actually becomes a function of the contrast of both gratings.

Figure 10.- The contrast-dependent weighting function. This rapidly saturating function given by
equation (3) was derived by choosing k1 and k2 such that the mean squared error between the simulated
and actual data was minimized.
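
The weighting function can be evaluated numerically to confirm the behavior described in the text (zero at threshold, close to 1 beyond 10 threshold units). The sketch assumes that equation (3) has the Weibull form f = 1 - exp(-((C_T - 1)/k1)^k2) for C_T ≥ 1 and f = 0 otherwise, with k1 = 1.99 and k2 = 0.76; the function name is hypothetical.

```python
import math

def speed_weight(contrast_in_threshold_units, k1=1.99, k2=0.76):
    """Contrast-dependent weighting of grating speed, assuming
    f = 1 - exp(-((C_T - 1) / k1) ** k2) for C_T >= 1 and f = 0 below."""
    ct = contrast_in_threshold_units
    if ct < 1:
        return 0.0
    return 1.0 - math.exp(-(((ct - 1) / k1) ** k2))

f_at_threshold = speed_weight(1.0)   # 0.0: speed signal vanishes at threshold
f_at_3 = speed_weight(3.0)           # intermediate weighting
f_at_10 = speed_weight(10.0)         # about 0.96: close to veridical speed
```

Under these assumptions a grating at 10 times threshold is weighted by about 0.96, so only gratings within a few threshold units of detection are slowed appreciably.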


Once f is determined for each grating, human performance can be simulated. The simple
intersection-of-constraints rule illustrated in figure 1(b) predicts that the perceived direction of motion
(α) is given by the following equation (Stone, 1988):

$$ \alpha = \arctan\left[\frac{V_1 - V_2}{V_1 + V_2}\,\operatorname{cotan}\theta\right] \qquad (4) $$

where V1 and V2 are the speeds of the two grating components and θ is the plaid angle. Note that,
when V1 = V2 = V, α is zero and the plaid is perceived to move straight up. If, however, perceived
rather than true speed is the input to the intersection-of-constraints stage and the perceived speed of the
i-th grating is f_i V, then the perceived direction of motion when the plaid is actually moving straight up is

$$ \alpha = \arctan\left[\frac{f_1 - f_2}{f_1 + f_2}\,\operatorname{cotan}\theta\right] \qquad (5) $$

Equation (5) allows us to simulate human perception of the direction of motion and to compare the result
with the data in figure 7.
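
The direction predicted by the intersection-of-constraints rule can be computed in a few lines. The sketch assumes the form α = arctan[((V1 - V2)/(V1 + V2)) cotan θ], which follows from intersecting the two constraint lines of gratings whose normals lie at ±θ from the bisecting axis. The sign convention (positive α toward grating 1) and the function names are assumptions made for illustration.

```python
import math

def ioc_direction_deg(v1, v2, plaid_angle_deg):
    """Direction of plaid motion in degrees from the bisecting axis, assuming
    alpha = arctan(((v1 - v2) / (v1 + v2)) * cotan(theta)).
    Positive values point toward grating 1 (assumed convention)."""
    cot_theta = 1.0 / math.tan(math.radians(plaid_angle_deg))
    return math.degrees(math.atan((v1 - v2) / (v1 + v2) * cot_theta))

def predicted_bias_deg(f1, f2, plaid_angle_deg):
    """Perceived direction of an upward-moving plaid when each grating's
    speed is scaled by its weighting factor (f1, f2). The common true speed
    V cancels, so only the weights enter."""
    return ioc_direction_deg(f1, f2, plaid_angle_deg)

no_bias = ioc_direction_deg(1.0, 1.0, 60.0)      # equal speeds: straight up
two_to_one = ioc_direction_deg(2.0, 1.0, 60.0)   # about 10.9 deg
half_speed = predicted_bias_deg(1.0, 0.5, 60.0)  # same magnitude
```

For a 60° plaid angle, a grating perceived at half its true speed yields a bias of about 11° toward the other grating, and the bias approaches 30° as the weight of one grating goes to zero.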

Results- Figure 11(a) plots the average bias of the four observers as a function of log2 contrast
ratio for four different total contrasts by condensing the data presented in figure 7. Because the
performance of all four subjects was qualitatively the same and because there is an inherent symmetry in the
series of contrast ratios (i.e., the curves in fig. 7 are nearly antisymmetric), we collapsed the data over
symmetric pairs of data points and averaged it over all subjects. We defined the contrast-dependent bias
for a given contrast ratio (and its inverse) as the difference in perceived vertical for symmetric contrast
ratio pairs divided by two. Figure 11(b) plots the simulated output of the model (using eqs. (2), (3),
and (5)) under the same conditions. Using the simplex method of nonlinear curve fitting (Press et al.,
1988), we selected the only two floating parameters of the model (k1 and k2 of eq. (3)) such that
simulations of equation (5) optimally (least-squared error) fit the data shown in fig. 11(a). Both the actual and
simulated data show systematic shifts in the perceived direction of motion up to about 20° toward the
direction of the higher-contrast grating.
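
The collapsing step defined above (half the difference in perceived vertical between a contrast ratio and its inverse) can be illustrated with the raw values quoted earlier for observer C.F.: 10.3° leftward at a contrast ratio of 8 and 14.9° rightward at 0.125. The sketch takes leftward as positive, following the figure captions; the function name is hypothetical.

```python
def contrast_dependent_bias(pv_ratio_deg, pv_inverse_ratio_deg):
    """Half the difference in perceived vertical (degrees, leftward positive)
    between a contrast ratio and its inverse."""
    return (pv_ratio_deg - pv_inverse_ratio_deg) / 2.0

# Observer C.F.: +10.3 deg at ratio 8, -14.9 deg at ratio 0.125.
bias_cf = contrast_dependent_bias(10.3, -14.9)

# A constant offset common to both conditions cancels in the difference.
bias_with_offset = contrast_dependent_bias(10.3 + 2.0, -14.9 + 2.0)
```

Taking the half-difference removes any contrast-independent offset in an observer's subjective vertical, which is presumably why the symmetric pairs were combined this way.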



Figure 11.- Simulated versus actual bias. (a) Plot of the same data as that in figure 7 averaged over
subjects and over symmetric contrast-ratio pairs. (b) Simulations of the model in figure 9 under the same
conditions as (a). Horizontal axes: log2 contrast ratio.

The model in figure 9 also qualitatively predicts the effect of changing the spatial and temporal
frequency of the stimulus. Figure 12(a) plots the average bias of two observers as a function of log2
contrast ratio at three different spatial frequencies. Figure 12(b) plots the output of our model under the
same conditions. The model qualitatively predicts that the bias will be larger at 0.75 c/d and smaller at
3.0 c/d. This prediction results from the changes in threshold as a function of spatial frequency
(fig. 4(b)): changes in the threshold parameters (table 1) produce changes in C_T (through eq. (2)) which
via equations (3) and (5) yield changes in the simulated bias. There is, however, a significant
quantitative discrepancy between the data and the simulations. Specifically, at 3.0 c/d, there is a small decrease
in threshold so C_T is slightly larger and therefore f is slightly closer to 1. This leads to a small decrease
in the simulated bias. However, there is a large decrease in the actual bias seen by our two observers.
Similarly, at 0.75 c/d, although there is a large increase in threshold and, therefore, a large increase in
the simulated bias, there is only a small increase in the actual bias seen by our two observers.

Increasing the temporal frequency had little effect on the perceived direction of plaid motion.
Figure 13(a) shows the average data for the same two observers at mean temporal frequencies of 1.5 Hz and
4.5 Hz (plaid speeds of 2 and 6°/sec). Figure 13(b) shows the simulation of the model under the same
conditions (using table 1 and eqs. (2), (3), and (5)). Temporal frequency changes had little effect on
threshold (fig. 4(a)) and therefore had little effect on the simulated bias.

Figure 12.- Effect of spatial frequency. (a) Plot of the bias of two subjects at three spatial frequencies
averaged over symmetric contrast-ratio pairs. (b) Simulations of the model in figure 9 under the
same conditions as (a).

Figure 13.- Effect of temporal frequency. (a) Plot of the bias of two subjects at two temporal frequencies
averaged over symmetric contrast-ratio pairs. (b) Simulations of the model in figure 9 under the
same conditions as (a).

Our revised model does not appear robust to changes in the plaid angle. When the plaid angle is
decreased to 30°, some subjects show a bias toward the direction of motion of the lower-contrast grating. 
Figure 14 shows the biases of two different observers at three different plaid angles. Decreasing the 
plaid angle affected the two subjects differently. The subject in figure 14(a) showed a reduced bias for a 
plaid angle of 45° and, at 30°, a reversal of the bias toward the motion of the lower-contrast grating. The 
model in figure 9 does not predict this reversal. However, the subject in figure 14(b) did not show this 
reversal. 



Figure 14.- Effect of plaid angle. (a)-(b) Plots of the biases of two subjects at three different plaid
angles; the horizontal axis of each panel is log2 contrast ratio.


DISCUSSION 


Perception of Motion for Unequal Contrast Plaids 

The results presented here and recent results by others (Kooi, DeValois, and Wyman, 1988) clearly 
show that the simple intersection-of-constraints rule model proposed by Adelson and Movshon (1982) 
cannot account for the perceived direction of motion of plaids when the components are of unequal 
contrast. However, a simple modification of their model, which takes into consideration the fact that the 
perceived speed of a moving grating is dependent on its contrast (Thompson, 1982), can, in most cir- 
cumstances, account for the perceived direction of a moving plaid. 
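
The logic of this modification can be illustrated with a short sketch. It is not the simulation used for the figures. The weights `w_high` and `w_low` stand in for the contrast-dependent weighting f evaluated at each grating's contrast, and their values are hypothetical. The parameter `half_angle_deg` is assumed to be the angle between each grating's direction of motion and the true plaid direction. The intersection-of-constraints rule is then applied to the weighted component speeds.

```python
import math

def ioc_bias_deg(w_high, w_low, half_angle_deg):
    """Bias (deg from vertical) of the intersection-of-constraints direction.

    Positive values point toward the first (higher-weight) grating.
    Assumptions of this sketch, not taken from the text: the plaid truly
    moves straight up at unit speed, and the two gratings move at
    +/- half_angle_deg from vertical.
    """
    a = math.radians(half_angle_deg)
    true_component_speed = math.cos(a)   # speed along each grating's normal
    s1 = w_high * true_component_speed   # weighted (perceived) component speeds
    s2 = w_low * true_component_speed
    # Constraint lines V . n1 = s1 and V . n2 = s2, with unit normals
    # n1 = (sin a, cos a) and n2 = (-sin a, cos a); solve for V = (vx, vy).
    vx = (s1 - s2) / (2.0 * math.sin(a))
    vy = (s1 + s2) / (2.0 * math.cos(a))
    return math.degrees(math.atan2(vx, vy))

# Hypothetical weights for the lower-contrast grating.
for w_low in (1.0, 0.9, 0.7, 0.5):
    print(w_low, round(ioc_bias_deg(1.0, w_low, 30.0), 2))
```

Equal weights give no bias. Any reduction of the lower-contrast grating's weight rotates the predicted direction toward the higher-contrast grating, which is the sign of the effect reported here.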

The modified model is robust in that it qualitatively predicts changes (or lack thereof) in perceived
direction as a function of spatial and temporal frequency. The quantitative discrepancy between the
predicted and actual effect of changing spatial frequency (fig. 12) can be explained if one postulates that f,
the contrast-dependent weighting function, is itself a function of spatial frequency. In all of our
simulations, f was defined by equation (3) as determined by fitting the data for a 1.5 c/d stimulus. If one
allows the two floating parameters to vary with spatial frequency, one can quantitatively account for the
data in figure 12(a).

Our results cannot be explained by incoherent plaid motion. Our standard stimulus (with a plaid
angle of 60°) subjectively appeared to move coherently for all subjects even at contrast ratios as high as
8√2. This is not inconsistent with previous observations on coherence (Movshon et al., 1986).
Furthermore, an objective measure of the coherence of our standard stimulus is the fact that the precision
of the direction judgment is nearly independent of contrast ratio (fig. 8).

An important caveat when interpreting our results and those of others (Adelson and Movshon, 1982; 
Movshon et al., 1986; Ferrera and Wilson, 1988, 1989; Kooi et al., 1988; Welch, 1989), is that moving 
plaids are strong stimuli for optokinetic eye movements and that eye movements may contribute to the 
perception of plaid motion. In this study, the brief stimulus duration (300 msec) makes it unlikely that 
eye-movement contamination dominates the percept. 

The fact that small plaid angles can cause a bias in the direction of the low-contrast grating cannot be 
explained by our revised model. However, there are at least two possible explanations for this discrep- 
ancy. First, because our moving plaids appeared less coherent at low plaid angles, it is possible that 
tracking behavior is different. If eye movements preferentially track the high-contrast grating, then the 
resulting retinal motion would be biased in the direction of the low-contrast grating, thus explaining the 
reversed bias seen in figure 14(a). Furthermore, if coherence threshold varies for different subjects then 
this could explain the intersubject variability as well. Second, the high-contrast grating might alter the 
perceived direction of the low-contrast grating (Levinson and Sekuler, 1976). This directional 
masking would act to reduce the bias and, if it were large enough, could cause a reversal of the bias. It is 
logical to assume that directional masking would be more severe for small plaid angles because the 
component directions are more nearly equal, although the intersubject variability is still puzzling. The 
issue of the effect of plaid angle remains unresolved. 
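
The first explanation rests on simple vector arithmetic, which the following sketch makes explicit. It is an illustration only: the eye is assumed to move with a fraction `tracking_gain` (a hypothetical value) of the velocity of the high-contrast grating, and `half_angle_deg` is assumed to be the angle between each grating's direction of motion and the true plaid direction. Retinal motion is taken as pattern velocity minus eye velocity.

```python
import math

def retinal_direction_deg(tracking_gain, half_angle_deg):
    """Direction (deg from vertical) of the plaid's retinal motion during partial tracking.

    Positive values point toward the high-contrast grating, negative
    values toward the low-contrast grating. The plaid is assumed to move
    straight up at unit speed.
    """
    a = math.radians(half_angle_deg)
    pattern = (0.0, 1.0)
    component_speed = math.cos(a)
    # Velocity of the high-contrast grating along its normal (sin a, cos a).
    high = (component_speed * math.sin(a), component_speed * math.cos(a))
    # Assumed eye velocity: a fraction of the high-contrast grating velocity.
    eye = (tracking_gain * high[0], tracking_gain * high[1])
    rx = pattern[0] - eye[0]
    ry = pattern[1] - eye[1]
    return math.degrees(math.atan2(rx, ry))

# Hypothetical tracking gains at a half-angle of 15 deg.
for gain in (0.0, 0.1, 0.2):
    print(gain, round(retinal_direction_deg(gain, 15.0), 2))
```

With no tracking the retinal direction is veridical. Any positive gain tilts the retinal motion away from the tracked high-contrast grating, that is, toward the low-contrast grating, as the argument above requires.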

Finally, our success in salvaging the Adelson-Movshon hypothesis should not be construed as proof 
that the hypothesis is correct. Recently, Welch (1989) provided strong support for the first tenet of the 
hypothesis: that the motion of the plaid is first decomposed into the motion of the individual compo- 
nents. However, Ferrera and Wilson (1988, 1989) have found evidence that the intersection-of- 
perpendicular-constraints rule is not always used at the second stage of processing. Explaining our data 
with the revised Adelson-Movshon model should not be viewed as an endorsement of the intersection- 
of-constraints rule. It is likely that our data could be explained using a different second-stage rule. 
However, a contrast-dependent nonlinearity would still be required. 


Contrast-Dependent Effects in Motion Processing 

The contrast-dependent weighting function (fig. 10), determined by a least-squares, two-parameter
fit to our bias data in figure 11, saturates at very low contrast (reaches 0.5 at 2.3 times threshold or
below 2% contrast). Low-contrast saturation is associated with many psychophysical phenomena
involving moving stimuli. Figure 15(a) replots f versus log10 contrast (in threshold units) together with
psychophysical measurements made in four different studies. As stated above, Thompson (1982) directly
measured the perceived speed of gratings as a function of contrast and found that low-contrast gratings
appear to move more slowly than a high-contrast reference moving at the same speed. The closed
squares plot the mean perception of two subjects as the ratio of perceived speed of a test grating to that
of a standard grating of 25% contrast, assuming a detection threshold contrast of 0.5%. A higher
detection threshold would shift the curve to the left and would therefore improve overlap with f. In
Thompson’s study, the contrast effect appears to saturate more slowly, but he used a stimulus that differed
slightly in spatial (his 2 c/d versus our 1.5 c/d) and temporal (his 2 Hz versus our 1.5 Hz) frequency, in
mean luminance (his 32 cd/m² versus our 100 cd/m²), and greatly in duration (his 2.5 sec versus our
0.3 sec). In addition, from his data, it is difficult to assess the precision and possible bias associated with
his matching technique. We assumed that a test grating of 25% matched the standard perfectly. All of
these factors may explain the small quantitative differences between the f measured in this study and
Thompson’s results.

Figure 15.- Comparison of our contrast-dependent nonlinearity with normalized contrast response
functions found in the literature. (a) This panel replots the same function f found in figure 10 on a
log-scale plot together with data from four different psychophysical studies. The studies looked at
the effect of contrast on perceived grating speed (filled squares), on detection of grating
displacements (open squares), on motion after-effect duration (filled circles) and initial speed (open
circles), and on perceived rotational frequency (filled triangles). (b) This panel replots f as a function
of absolute contrast with the data from three different physiological studies: MT, V1 (squares), and
LGN data are from Sclar et al. (1989), V1 data (diamonds) are from Albrecht and Hamilton (1982),
and retinal ganglion-cell data are from Kaplan and Shapley (1986). All of the physiological contrast
response functions were plotted using the mean or median best-fitting hyperbolic function
normalized to their response at 100% contrast. For the ganglion cells, the exponent of the hyperbolic
function was fixed at 1.

Nakayama and Silverman (1985) measured the effect of contrast on the minimum displacement of a 
sinusoidal grating that can be discriminated (leftward or rightward motion of a vertical grating). The 
minimum displacement (in degrees of phase) necessary for discrimination decreases to 5-10° as contrast 
increases to about 3%, then remains nearly constant. The open squares in figure 15(a) plot what the 
authors called the normalized “effective contrast” of the stimulus (mean of the best-fitting hyperbolic 
functions for two subjects assuming again that threshold was 0.5%). Their effect saturates at nearly the 
same rate (reaches 0.5 at about 2.1 times threshold or around 1% contrast) as the contrast effect in this 
study. 

Keck, Palella, and Pantle (1976) studied the effect of contrast on motion after-effect (MAE) and 
found that both the duration (closed circles) and perceived initial speed (open circles) of MAE (nor- 
malized with respect to that of a 12.5% grating) saturate at low contrast. Their MAE duration data nearly 
superimpose on the f derived in this study. Their MAE speed data saturate more slowly and at nearly 
the same rate as Thompson’s data. 

Campbell and Maffei (1981) looked at the effect of contrast on perceived rotational speed and found 
that a rotating low-contrast grating patch was perceived to rotate more slowly than an otherwise identical 
high-contrast stimulus rotating at the same rate. The closed triangles in figure 15(a) plot the ratio of per- 
ceived rotational frequency of a test grating to that of a 20% contrast grating. The effect again saturates 
at low contrast, although not as rapidly as the effect in this study. 

The fact that all of these disparate psychophysical studies seem to saturate similarly at low contrast 
probably reflects a fundamental property of a shared input stage for human judgments of motion. It is 
important for contrast responses within the motion-processing system to saturate early to disambiguate 
signals related to contrast from those related to motion. The interesting finding of this and the other stud- 
ies is not that there are contrast-dependent misperceptions of motion, but actually that these mispercep- 
tions only occur at the extreme low end of the contrast scale. 

Examination of the contrast-response properties of neurons within the monkey visual cortex suggests 
that this shared input is at a higher stage than striate cortex (Albrecht and Hamilton, 1982; Sclar, 
Maunsell, and Lennie, 1989). Figure 15(b) replots f as a function of log absolute contrast together with 
the normalized mean contrast response functions of ganglion cells (triangles), lateral geniculate neurons 
(circles), and neurons within the striate (open squares and diamonds) and extrastriate visual cortex (solid 
squares) of macaque monkeys. Albrecht and Hamilton found that striate neurons (V1) show a wide
range of contrast response functions. Some neurons begin responding at 1% contrast and saturate by
10%. Others do not begin responding until 10% contrast. V1 neurons, on average, reached 50% of their
maximal response at 23.9% contrast, with those neurons tuned for 1.5 c/d doing so at around 20%.
Similarly, Sclar and colleagues (1989) found that, on average, V1 neurons reached 50% of their maximal
response at 31.6%. They also found that neurons within the middle temporal area (MT), an area of 
extrastriate visual cortex specifically associated with motion processing (Maunsell and Van Essen, 1983; 
Rodman and Albright, 1987; DeYoe and Van Essen, 1988), saturate at much lower contrast. On average, 
MT neurons reached 50% of their maximal response at only 7.6% contrast, with some individual 
neurons reaching 50% saturation at as low as 1.6% contrast. Sclar and colleagues (1989) and Kaplan and 
Shapley (1986) measured the contrast response functions of neurons at earlier stages in the visual path- 
way, in the lateral geniculate nucleus (LGN) and retina, respectively. MT neurons have higher contrast 
sensitivities apparently because they receive a selective input from the magnocellular pathway (solid 
symbols) beginning at the retina (Kaplan and Shapley, 1986; DeYoe and Van Essen, 1988) and because 
they receive pooled information from lower-level neurons with smaller receptive fields (Sclar, Maunsell, 
and Lennie, 1989). On average, LGN neurons and ganglion cells within the magnocellular pathway 
reached 50% of their maximal response at 9.6% and 10.4% contrast, respectively. A separate
parvocellular pathway has lower contrast sensitivity with neurons reaching 50% of their maximal response at
36.5% contrast in the LGN and at 38.9% in the retina.¹
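
The distinction drawn in footnote 1 between the parameter C50 of a hyperbolic contrast-response function and the contrast that actually yields half of the response at 100% contrast can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the (C50, n) pairs passed in are example inputs rather than fitted values from the cited studies.

```python
def hyperbolic_response(c, c50, n, r_max=1.0):
    """Hyperbolic contrast-response function R = r_max * C^n / (C^n + C50^n).

    Contrasts are in percent (0 to 100).
    """
    return r_max * c**n / (c**n + c50**n)

def true_half_max_contrast(c50, n):
    """Contrast (percent) giving half of the response obtained at 100% contrast."""
    return 100.0 * c50 / (100.0**n + 2.0 * c50**n) ** (1.0 / n)

# Example inputs only: (c50 in percent, exponent n).
for c50, n in ((7.6, 2.0), (38.9, 1.0)):
    c_half = true_half_max_contrast(c50, n)
    ratio = hyperbolic_response(c_half, c50, n) / hyperbolic_response(100.0, c50, n)
    print(c50, n, round(c_half, 2), round(ratio, 3))
```

The corrected value is always below C50, and the difference is negligible only when C50 is much smaller than 100%.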

The psychophysical phenomena described in figure 15(a) still saturate faster than the responses in 
MT (or responses at earlier stages in the magnocellular pathway) suggesting that either the common 
input for psychophysical judgments is from a higher cortical level or that the psychophysical judgments 
use information pooled from several MT neurons. It is also possible that humans make psychophysical
judgments based on inputs from a selective group of MT neurons because a subset appears to saturate
as fast as the psychophysics (Sclar et al., 1989).

In neither V1 nor MT does it appear that speed is encoded in the firing rates of individual neurons 
(Maunsell and Van Essen, 1983; Rodman and Albright, 1987; Orban, Kennedy, and Bullier, 1986; 
Movshon, 1975). Therefore, it is a reasonable conjecture that the speed of a moving grating is encoded 
as some integral of the collective output of an ensemble of MT neurons or in some “higher” cortical area 
that receives pooled input from MT. However, regardless of the specific scheme used to encode speed, 
at low contrasts, because neuronal activity within MT is affected by both speed and contrast, decrements 
in firing within the ensemble of neurons caused by reductions of contrast might be misinterpreted as 
reductions in stimulus speed (which could just as easily have been the cause). This would lead to the 
psychophysical findings presented here and in Thompson’s study (1982). At higher contrast, changes in 
the activity of MT neurons only reflect changes in the motion of the stimulus so the perception of speed 
is veridical. 


Implications for Models of Human Motion Processing 

The fact that contrast can systematically and dramatically distort the perceived direction of plaid
motion puts a constraint on future models of human motion processing. As stated above, it cannot be
explained by the schematic model in figure 1 (Adelson and Movshon, 1982), but it can be explained by
the simple modification presented in figure 9. However, both the Adelson-Movshon and revised models
are mere cartoons that provide an organizational structure for motion processing; they are not true
models. We now examine the performance of a few, more complete models of visual motion processing
to determine if they mimic our psychophysical findings.

¹ When the average response (R) is given by a hyperbolic function of contrast (C), i.e.,

$$R = R_{max} \frac{C^n}{C^n + C_{50}^n}$$

it should be emphasized that $C_{50}$ is not the contrast value which produces half the maximum response.
Because C varies only between 0 and 100%, $R_{max}$ is often the extrapolated maximum, and the true
contrast value at half-maximum response is

$$\frac{100\, C_{50}}{\sqrt[n]{100^n + 2 C_{50}^n}}$$

which reduces to $C_{50}$ for $C_{50} \ll 100\%$.

One class of models that would not exhibit the same behavior as our subjects consists of cross- 
correlation models (e.g., Leese, Novak, and Taylor, 1970). When a plaid with components of unequal 
contrast moves straight up, a pure cross-correlation technique will show no bias because cross- 
correlation determines the maximum overlap between two successive frames and overlap is perfect 
(neglecting noise) for true upward motion. Therefore, our results clearly indicate that the human visual 
system is not using a full-field, cross-correlation technique. Bülthoff, Little, and Poggio (1989) recently 
proposed a neural network implementation of a variant of the cross-correlation method. Rather than 
performing a simple 2-D cross-correlation over the whole image, it does a local cross-correlation over a 
patch. If this patch is the whole image, the model reduces to a simple cross-correlation model. In 
response to a plaid whose components are of unequal contrast, it would therefore show no bias. If the 
patch is small, as compared to the spatial frequency of the gratings, the model will exhibit a spatially 
nonuniform response: at different points within the image, it will detect either the motion of an indi- 
vidual component or of a node. It is hard to say what the exact response to a plaid whose components 
were of unequal contrast would be, as it depends critically on the patch size, but it seems likely that it 
would detect either the true direction of motion or nonrigid motion, but not the systematic biases that we 
observed empirically. 
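
The claim that a full-field cross-correlation shows no bias can be illustrated with a small numerical sketch. This is not any of the cited models. The image size, the grating wave vectors, and the search range are arbitrary choices made for the example, the image is treated as periodic, and the +y direction stands in for "up".

```python
import math

N = 24  # image is N x N pixels and is treated as periodic

def plaid(c1, c2, a=2, b=3):
    """Sum of two gratings with contrasts c1 and c2.

    The gratings have wave vectors (a, b) and (-a, b) in cycles per image,
    so they are mirror-symmetric about the y axis.
    """
    return [[c1 * math.cos(2 * math.pi * (a * x + b * y) / N)
             + c2 * math.cos(2 * math.pi * (-a * x + b * y) / N)
             for x in range(N)] for y in range(N)]

def shift(img, dx, dy):
    """Circularly translate the image by (dx, dy) pixels."""
    return [[img[(y - dy) % N][(x - dx) % N] for x in range(N)] for y in range(N)]

def best_displacement(frame1, frame2, max_shift=2):
    """Displacement (dx, dy) that maximizes the full-field cross-correlation."""
    best, best_score = None, -math.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = sum(frame1[y][x] * frame2[(y + dy) % N][(x + dx) % N]
                        for y in range(N) for x in range(N))
            if score > best_score:
                best, best_score = (dx, dy), score
    return best

# An 8:1 contrast ratio, with the whole pattern displaced one pixel in +y.
first = plaid(0.4, 0.05)
second = shift(first, 0, 1)
print(best_displacement(first, second))
```

Because the second frame is an exact translate of the first, the correlation peak sits at the true displacement whatever the contrast ratio, so a model of this kind has no route to the biases observed here.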

A second class of models that one would expect to be invariant to changes in contrast comprises 
those that track the motion of the nodes by tracking the motion of edges (zero-crossings of the second 
spatial derivative) within the image (e.g., Marr and Ullman, 1981; Hildreth, 1984). The nodes in our 
stimuli always moved exactly upward, regardless of the contrast ratio and, therefore, should only 
provide information about the true direction of motion. However, it is not the determination of the edge 
velocity per se, but how the edge velocities are combined to determine the global motion of the plaid 
that is important. For example, one model (Perrone, 1989) that identifies moving edges within images 
and hence, ostensibly, tracks the moving nodes, uses a cosine-weighted voting scheme to determine the 
global motion. The Perrone model shows a directional bias toward the direction of motion of the higher- 
contrast grating in response to plaids composed of unequal contrast gratings (Perrone and Stone, 1988), 
although the biases are twice as large as those found here. The simulated bias occurs because the nodes 
change shape and become spatially asymmetric for contrast ratios different from 1. This asymmetry 
causes a shift in the distribution of edge-velocity vectors. The voting scheme then causes a shift in the 
output of the model. Therefore, models that look at the motion of edges within the image can predict 
directional biases despite the fact that the features they are tracking are moving in the true direction of 
motion of the pattern as a whole. However, feature-based models that identify and track the nodes per se 
would not show such biases. 

A third class of motion models consists of those that work directly with the spatial and temporal 
gradient of the image intensity (e.g., Limb and Murphy, 1975; Horn and Schunck, 1981). Recently, a 
neural network implementation of this approach has been shown to respond to plaids composed of 
unequal contrast gratings with directional biases toward the direction of motion of the higher-contrast 
grating (Koch et al., 1989). The bias is caused by the contrast dependence of their first-stage neurons 
(U cells) at low contrast, although the asymmetry of the spatial gradient, when the components have 
unequal contrasts, may also contribute. However, the proposal that the U cells are located in V1 is not
plausible, because the output of U cells is proportional to speed and no such units have been found
in V1 or anywhere else in the primate visual cortex.

A fourth class of models includes those that look at motion in the frequency domain (e.g., Watson 
and Ahumada, 1983, 1985), those that calculate motion energy (e.g., Adelson and Bergen, 1985), and 
the related elaborated Reichardt detectors (van Santen and Sperling, 1985). Motion-energy models are 
expected to be seriously affected by contrast manipulations because motion energy is basically propor- 
tional to the square of the contrast. To address this weakness, Heeger (1987) proposed a modified 
motion-energy model which normalizes the response by dividing the output of each sensor by the total 
energy for a given orientation. Because of this normalization, Heeger’s model determines the true direc- 
tion of motion for moving plaids with contrast ratios as high as 1:32, which is inconsistent with our 
results. Furthermore, the model without contrast normalization will yield larger biases over a wider 
range of contrasts than the biases observed here. New approaches to reduce the inherent contrast depen- 
dence of motion-energy models need to be developed. 
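
The contrast dependence at issue can be made concrete with a toy calculation. The sketch below is not an implementation of any of the cited models. It computes the energy of a quadrature pair of space-time sinusoidal filters for a drifting grating, and then divides by the summed energy of two opposite-direction sensors as a simplified stand-in for normalization by total energy at a given orientation. All sizes and the constant `sigma` are arbitrary illustrative choices.

```python
import math

def direction_energy(contrast, sensor_dir, nx=32, nt=32, cycles=2):
    """Quadrature-pair energy of a sensor tuned to drift direction sensor_dir.

    The stimulus is a grating of the given contrast drifting in the +1
    direction; sensor_dir is +1 (same direction) or -1 (opposite direction).
    """
    even = odd = 0.0
    for ix in range(nx):
        for it in range(nt):
            px = 2.0 * math.pi * cycles * ix / nx   # spatial phase
            pt = 2.0 * math.pi * cycles * it / nt   # temporal phase
            stimulus = contrast * math.cos(px - pt)
            phase = px - sensor_dir * pt
            even += stimulus * math.cos(phase)
            odd += stimulus * math.sin(phase)
    return even * even + odd * odd

def normalized_response(contrast, sigma=1e-9):
    """Preferred-direction energy divided by the total energy of both sensors."""
    e_pref = direction_energy(contrast, +1)
    e_null = direction_energy(contrast, -1)
    return e_pref / (e_pref + e_null + sigma)

print(direction_energy(0.2, +1) / direction_energy(0.1, +1))
print(normalized_response(0.05), normalized_response(0.4))
```

Doubling the contrast quadruples the raw energy, whereas the normalized response is essentially unchanged across an eightfold contrast range. Neither extreme matches the data: the observed biases are present but confined to low contrasts.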

Watson and Ahumada (1985) designed their model of human motion processing to be robust to con- 
trast variations. They determine the direction of motion by examining the temporal frequency of the 
response of linear spatio-temporal filters, a measure which is independent of contrast. Because of this, 
the Watson-Ahumada model finds the true direction of motion for plaids with contrast ratios as high as 
1:10, which is inconsistent with our results. Their model, however, can be modified, as was the Adelson- 
Movshon model, by incorporating a contrast-dependent nonlinearity. 

Our results are therefore inconsistent with the specific versions of a number of current models of 
human motion processing. In many cases, simple modifications can be made to account for our data. 
This discussion is not meant to be an exhaustive survey of motion models, but is merely intended to 
show how our psychophysical results can be used in many cases to refine and, in some cases, to rule out 
certain models. It is also meant to show how the quantitative comparison of empirical and simulation 
data is needed for the meaningful analysis of the biological plausibility of existing models of human 
visual motion processing. 